Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. Existing works measure explanation robustness with the $\ell_p$-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. Because the thickness itself involves expensive sampling and integration, we derive surrogate bounds of the thickness to mitigate ranking-based attacks while maintaining computational feasibility. We use a multi-objective approach to analyze the convergence of a gradient-based attack and confirm that explanation robustness can be measured by the thickness metric. Experiments on various network architectures and diverse datasets demonstrate the superiority of the proposed methods over the widely accepted Hessian-based curvature smoothing approaches, which are not as robust.
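The contrast between an $\ell_p$-norm view and a ranking view of robustness can be illustrated with a small sketch. This is not the paper's thickness metric; `top_k_overlap` and the toy saliency vectors are hypothetical, chosen only to show that a small $\ell_2$ change can alter the top-ranked features while a larger one can leave them intact.

```python
import math

def top_k(saliency, k):
    """Return the indices of the k largest-magnitude saliency scores."""
    order = sorted(range(len(saliency)), key=lambda i: abs(saliency[i]), reverse=True)
    return set(order[:k])

def l2_distance(a, b):
    """Euclidean distance between two saliency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_overlap(a, b, k):
    """Fraction of the top-k features shared by two explanations (illustrative only)."""
    return len(top_k(a, k) & top_k(b, k)) / k

# Toy saliency scores for four features (made-up numbers).
original = [0.52, 0.50, 0.49, 0.10]
swapped = [0.52, 0.48, 0.51, 0.10]   # tiny change, but features 1 and 2 swap ranks
scaled = [0.82, 0.80, 0.49, 0.10]    # larger change, same top-2 features

dist_swapped = l2_distance(original, swapped)
dist_scaled = l2_distance(original, scaled)
overlap_swapped = top_k_overlap(original, swapped, k=2)
overlap_scaled = top_k_overlap(original, scaled, k=2)
```

Here the explanation with the smaller $\ell_2$ distance is the one whose top-2 set changes, which is the mismatch the abstract points to.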
Translated by Google Translate
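Multi-objective optimization (MOO) aims to optimize multiple conflicting objectives simultaneously and has found important applications in machine learning, such as minimizing both classification loss and the discrepancy in how different populations are treated, in order to maintain fairness. At optimality, further optimizing one objective will necessarily harm at least one other objective, so decision-makers need to comprehensively explore multiple optima (called the Pareto front) to settle on one final solution. We address the efficiency of finding the Pareto front. First, finding the front from scratch using stochastic multi-gradient descent (SMGD) is expensive for large neural networks and datasets. We propose to explore the Pareto front as a manifold starting from a few initial optima, based on a predictor-corrector method. Second, for each exploration step, the predictor solves a large-scale linear system that scales quadratically in the number of model parameters and requires one backpropagation to evaluate a second-order Hessian-vector product per iteration of the solver. We propose a Gauss-Newton approximation that scales only linearly and requires only first-order inner products per iteration. This also allows a choice between the MINRES and conjugate gradient methods when approximately solving the linear system. These innovations make predictor-corrector methods feasible for large networks. Experiments on multi-objective (fairness and accuracy) misinformation detection tasks show that 1) the predictor-corrector method can find Pareto fronts better than or similar to those found by SMGD in less time; and 2) the proposed first-order method does not harm the quality of the Pareto front identified by the second-order method, while further reducing running time.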
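A minimal sketch of the idea of solving a Gauss-Newton system with only first-order inner products is given below. It is not the authors' implementation: it assumes the rows of a hypothetical matrix `jac_rows` are per-objective gradients, adds a damping term (an assumption, so that the system is positive definite), and uses a textbook conjugate gradient loop.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gauss_newton_matvec(jac_rows, v, damping):
    """Compute (J^T J + damping * I) v without ever forming J^T J.

    Only inner products with the rows of J are needed, so the cost is
    linear in the number of parameters per row.
    """
    coeffs = [dot(row, v) for row in jac_rows]      # J v
    out = [damping * vi for vi in v]                # damping * v
    for c, row in zip(coeffs, jac_rows):            # accumulate J^T (J v)
        for i, r in enumerate(row):
            out[i] += c * r
    return out

def conjugate_gradient(matvec, b, iters=50, tol=1e-20):
    """Textbook conjugate gradient for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy example: two objective gradients over three parameters (made-up numbers).
jac_rows = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
rhs = [1.0, 2.0, 3.0]
matvec = lambda v: gauss_newton_matvec(jac_rows, v, damping=0.1)
solution = conjugate_gradient(matvec, rhs)
residual = [a - b for a, b in zip(matvec(solution), rhs)]
```

Each matrix-vector product touches every gradient row once, which is where the linear scaling in the number of parameters comes from in this sketch.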
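Graph neural networks (GNNs) have achieved state-of-the-art performance in various high-stakes prediction tasks, but multiple layers of aggregation over graphs with irregular structure make GNNs less interpretable models. Prior methods either use simpler subgraphs to simulate the full model or use counterfactuals to identify the causes of a prediction. These two families of approaches target two distinct objectives, "simulatability" and "counterfactual relevance", but it is unclear how the objectives jointly influence human understanding of an explanation. We design a user study to investigate these joint effects and use the findings to design a multi-objective optimization (MOO) algorithm that finds Pareto-optimal explanations that are well balanced in simulatability and counterfactual relevance. Since the target model can be any GNN variant and may be inaccessible due to privacy concerns, we design a search algorithm that uses zeroth-order information without accessing the architecture or parameters of the target model. Quantitative experiments on nine graphs from four applications show that the Pareto-efficient explanations dominate single-objective baselines that use first-order continuous optimization or discrete combinatorial search. The explanations are further evaluated for robustness and sensitivity to show their ability to reveal convincing causes while remaining cautious about possible confounders. The diverse dominating counterfactuals can certify the feasibility of algorithmic recourse, which may promote algorithmic fairness where humans participate in decision-making with GNNs.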
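What it means for explanations to be Pareto optimal, and to dominate a baseline, can be sketched as follows. The scores are made-up pairs of (simulatability, counterfactual relevance) with higher being better; the function names are hypothetical and the search algorithm itself is not shown.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (simulatability, counterfactual relevance) scores of candidate explanations.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
```

In this toy set, `(0.5, 0.5)` and `(0.1, 0.1)` are dominated by `(0.6, 0.6)`, while the remaining three candidates trade one objective against the other and so all stay on the front.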
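Graphs are ubiquitous in many applications, such as social networks, knowledge graphs, and smart grids. Graph neural networks (GNNs) are the current state of the art for these applications, yet they remain opaque to humans. Explaining GNN predictions can add transparency. However, because many graphs are not static but continuously evolving, explaining the change in a prediction between two graph snapshots is a different but equally important problem. Existing methods either explain only static predictions or generate coarse or irrelevant explanations for dynamic predictions. We define the problem of explaining evolving GNN predictions and propose a method that uniquely decomposes the change in a prediction into paths on the computation graph. Attributions to the many paths that involve high-degree nodes remain uninterpretable, while simply selecting the top important paths can be suboptimal for approximating the change. We formulate a novel convex optimization problem to optimally select the paths that explain the prediction evolution. Theoretically, we prove that the existing method based on layer-wise relevance propagation (LRP) is a special case of the proposed algorithm when the comparison is made against an empty graph. Empirically, on seven graph datasets and with a novel metric for evaluating explanations of prediction change, we demonstrate the superiority of the proposed approach over existing methods, including LRP, DeepLIFT, and other path selection methods.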
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
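The abstract does not spell out the GRN computation. The sketch below follows the commonly cited formulation (per-channel L2 norm over spatial positions, divided by the mean norm across channels, then a learnable scale and shift with a residual connection) and should be read as an assumption about the layer rather than the authors' code. The data layout (a list of channels, each a flat list of spatial activations) is chosen purely for illustration.

```python
import math

def global_response_norm(x, gamma, beta, eps=1e-6):
    """Sketch of Global Response Normalization on one feature map.

    x     : list of channels, each a flat list of spatial activations
    gamma : per-channel scale (assumed learnable)
    beta  : per-channel shift (assumed learnable)
    """
    # Global aggregation: L2 norm of each channel over all spatial positions.
    norms = [math.sqrt(sum(v * v for v in channel)) for channel in x]
    # Divisive normalization: each channel's norm relative to the mean norm.
    mean_norm = sum(norms) / len(norms)
    scale = [n / (mean_norm + eps) for n in norms]
    # Calibration with a residual connection.
    return [
        [g * (v * s) + b + v for v in channel]
        for channel, s, g, b in zip(x, scale, gamma, beta)
    ]

# Toy feature map with two channels and two spatial positions (made-up numbers).
features = [[3.0, 4.0], [0.0, 0.0]]
identity_out = global_response_norm(features, gamma=[0.0, 0.0], beta=[0.0, 0.0])
scaled_out = global_response_norm(features, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With zero scale and shift the layer reduces to the identity; with unit scale, the channel with the larger norm is amplified relative to the weaker one, which is one way to read "inter-channel feature competition".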
A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasonable assumptions, a high-probability bound on the iteration complexity of the algorithm to approximate first-order stationarity is derived. Numerical results on standard nonlinear optimization test problems illustrate the advantages and limitations of our proposed method.
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet has been criticized for learning inefficiency. We believe the insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjointness constraint, which raises the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens, respectively, with superior learning targets. Rooted in orthogonal perspectives on improving training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing the model's generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) and still report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, DMJD also shows superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
Considering the computational complexity, we propose a Guided Hybrid Quantization with One-to-one Self-Teaching (GHOST) framework. More concretely, we first design a structure called guided quantization self-distillation (GQSD), which realizes a lightweight model through the synergy of quantization and distillation. The training process of the quantized model is guided by its full-precision counterpart, which saves time and cost because no huge pre-trained model has to be prepared in advance. Second, we put forward a hybrid quantization (HQ) module to obtain the optimal bit width automatically under a constraint in which a threshold on the distribution distance between the center and the samples is applied in the weight value search space. Third, in order to improve information transformation, we propose a one-to-one self-teaching (OST) module to give the student network an ability of self-judgment. A switch control machine (SCM) builds a bridge between the student network and the teacher network at the same location to help the teacher reduce wrong guidance and impart vital knowledge to the student. This distillation method allows a model to learn from itself and gain substantial improvement without any additional supervision. Extensive experiments on a multimodal dataset (VEDAI) and single-modality datasets (DOTA, NWPU, and DIOR) show that object detection based on GHOST outperforms the existing detectors. Its small parameter size (<9.7 MB) and Bit-Operations (BOPs) (<2158 G), compared with remote sensing-based, lightweight, or distillation-based algorithms, demonstrate its superiority in lightweight design. Our code and model will be released at https://github.com/icey-zhang/GHOST.
Automatic font generation without human experts is a practical and significant problem, especially for languages that consist of a large number of characters. Existing methods for font generation are often based on supervised learning. They require large amounts of paired data, which are labor-intensive and expensive to collect. Common unsupervised image-to-image translation methods, in turn, are not applicable to font generation, as they often define style as a set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and uses the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of the FDSC are fed into a mixer to generate the final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarities and dissimilarities between fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to the adversarial loss, two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of the FDSC and the adopted loss functions, our model is able to maintain spatial information and generate high-quality character images in an unsupervised manner. Experiments demonstrate that our model generates character images of higher quality than state-of-the-art methods.
Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Field, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, making it possible to separately control the attributes of the face, identity, illumination, and eye gaze direction. Diverse 3D-aware gaze datasets can thus be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.